Skip to content

[SPARK-29672][PYSPARK] update spark testing framework to use python3#26330

Closed
shaneknapp wants to merge 32 commits into
apache:masterfrom
shaneknapp:remove-py27-tests
Closed

[SPARK-29672][PYSPARK] update spark testing framework to use python3#26330
shaneknapp wants to merge 32 commits into
apache:masterfrom
shaneknapp:remove-py27-tests

Conversation

@shaneknapp

@shaneknapp shaneknapp commented Oct 30, 2019

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

remove python2.7 tests and test infra for 3.0+

Why are the changes needed?

because python2.7 is finally going the way of the dodo.

Does this PR introduce any user-facing change?

newp.

How was this patch tested?

the build system will test this

@shaneknapp shaneknapp changed the title [WIP][SPARK-29672][PYSPARK] remove py27 support for jenkins tests [WIP][SPARK-29672][PYSPARK] DO NOT MERGE -- remove py27 support for jenkins tests Oct 30, 2019
@SparkQA

SparkQA commented Oct 30, 2019

Copy link
Copy Markdown

Test build #112965 has finished for PR 26330 at commit 92f7268.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 30, 2019

Copy link
Copy Markdown

Test build #112966 has finished for PR 26330 at commit 7c0fe46.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you for making this PR, @shaneknapp .
From the Jenkins log, PIP installation test seems to use Python 3.5 still. Could you take a look at that?

Testing pip installation with python 3.5
Using /tmp/tmp.MBOPq46hzq for virtualenv

@shaneknapp

Copy link
Copy Markdown
Contributor Author

Thank you for making this PR, @shaneknapp .
From the Jenkins log, PIP installation test seems to use Python 3.5 still. Could you take a look at that?

Testing pip installation with python 3.5
Using /tmp/tmp.MBOPq46hzq for virtualenv

that's actually in dev/run-pip-tests... i guess i can hack on that as it's kinda in the scope of this PR.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

that's actually in dev/run-pip-tests... i guess i can hack on that as it's kinda in the scope of this PR.

btw i've never really looked too closely at run-pip-tests, so i may have a few extra commits as i mess around w/it trying to get it to use 3.6, and strip out the python2 stuff.

@SparkQA

SparkQA commented Oct 31, 2019

Copy link
Copy Markdown

Test build #112972 has finished for PR 26330 at commit f25c4b3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 31, 2019

Copy link
Copy Markdown

Test build #112968 has finished for PR 26330 at commit 226c1fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

test this please

@SparkQA

SparkQA commented Oct 31, 2019

Copy link
Copy Markdown

Test build #112977 has finished for PR 26330 at commit f25c4b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

okie dokie, i'm confident that this change won't break anything, but i'll wait on merging until i get the word.

@HyukjinKwon @mengxr @srowen

@srowen srowen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems OK to me, though I don't know this script well.

@shaneknapp

shaneknapp commented Oct 31, 2019

Copy link
Copy Markdown
Contributor Author

Seems OK to me, though I don't know this script well.

this is @holdenk 's contribution... should have added her to the review earlier.

also, if you look at the console output (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112977/console), the following block of code in dev/run-pip-tests causes some really ugly log4j errors that i would LOVE to clean up (but i'm not entirely sure how easy that might be):

    echo "Run basic sanity check on pip installed version with spark-submit"
    spark-submit "$FWDIR"/dev/pip-sanity-check.py
    echo "Run basic sanity check with import based"
    python "$FWDIR"/dev/pip-sanity-check.py
    echo "Run the tests for context.py"
    python "$FWDIR"/python/pyspark/context.py
Run basic sanity check on pip installed version with spark-submit
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: target/unit-tests.log (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
<SNIP>
Successfully ran pip sanity check

@holdenk

holdenk commented Oct 31, 2019

Copy link
Copy Markdown
Contributor

The code change LGTM provided it runs in Jenkins
As for the log4j errors clean up I think we can address that in a follow up. My guess is we need to set an env var and create a file but I think it's ok to move forward with this first.

@SparkQA

SparkQA commented Oct 31, 2019

Copy link
Copy Markdown

Test build #113043 has finished for PR 26330 at commit b57acd9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

test this please

Comment thread python/run-tests.py Outdated
@SparkQA

SparkQA commented Oct 31, 2019

Copy link
Copy Markdown

Test build #113050 has finished for PR 26330 at commit b57acd9.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

Test build #113050 has finished for PR 26330 at commit b57acd9.

  • This patch fails PySpark unit tests.

i'll revert my pypy3 change and let the test run again... this is officially out of scope of this pr. :)

This reverts commit b57acd9.
@HyukjinKwon

Copy link
Copy Markdown
Member

Actually, @shaneknapp, I think we should remove pypy 2.5 too, which is compatible with Python 2 ... does it make things complicated?

@HyukjinKwon

HyukjinKwon commented Nov 1, 2019

Copy link
Copy Markdown
Member

Ah, tests fails with:

py4j.protocol.Py4JError: org.apache.spark.SparkConf.__name__ does not exist in the JVM

Okay, this one I think I can try to investigate it in a separate PR. I will make a followup after this PR.

@HyukjinKwon HyukjinKwon left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general. There is one side issue #26330 (comment)

@shaneknapp

Copy link
Copy Markdown
Contributor Author

Actually, @shaneknapp, I think we should remove pypy 2.5 too, which is compatible with Python 2 ... does it make things complicated?

nope, it makes things easier. :)

@SparkQA

SparkQA commented Nov 1, 2019

Copy link
Copy Markdown

Test build #113059 has finished for PR 26330 at commit 0b37b71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

Actually, my initial thought is that we drop everything right after Spark 3.0 branch is cut out (including upgrading R version to 3.4 as well to address intermittent CRAN check failure) which is strictly correct.

However, it's not always able to match the exact timing in practice and those infer/build stuff need all your efforts and time. So TBH I just don't want to say let's hold off at this moment until 3.0.
I will leave it to you and other committers here.

imho it's better to get the test/build infra upgraded before new versions are cut... if we still want to 'unofficially' support python2.7/pypy2.5.1 right up until we release 3.0, i can add those back to python/run-tests.py.

in fact, i will do that now "just to see what happens". :)

@BryanCutler

Copy link
Copy Markdown
Member

My thought is that it's not that practical to test every PR against every Python version we support. We should focus on maybe 2 of the most widely used versions. Since we still need to support these versions, we can setup pre-release tasks to run the full tests and correct any issues that have come up. I think any issues will mostly be minor syntax problems and should be easily corrected.

@shaneknapp

Copy link
Copy Markdown
Contributor Author

My thought is that it's not that practical to test every PR against every Python version we support. We should focus on maybe 2 of the most widely used versions. Since we still need to support these versions, we can setup pre-release tasks to run the full tests and correct any issues that have come up. I think any issues will mostly be minor syntax problems and should be easily corrected.

so, which two versions would you suggest? 3.6? 2.7? some other minor version of 3?

@SparkQA

SparkQA commented Nov 13, 2019

Copy link
Copy Markdown

Test build #113722 has finished for PR 26330 at commit b5bada3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp shaneknapp self-assigned this Nov 14, 2019
@shaneknapp

shaneknapp commented Nov 14, 2019

Copy link
Copy Markdown
Contributor Author

ok, i have an idea:

how about i change the scope of this PR (and associated JIRA) to solely update the test/build infrastructure to python3. i will leave the python2.7/pypy2.5.1 test calls in place (see b5bada3) for now.

once we cut the first 3.0 release, we can decide exactly which major/minor versions of python to test against on master and update python/run-tests.py accordingly.

sound reasonable? if so, then this PR is ready to merge as-is.

@BryanCutler

Copy link
Copy Markdown
Member

so, which two versions would you suggest? 3.6? 2.7? some other minor version of 3?

If we are choosing 2 versions, I think 3.6 and maybe 3.8.

once we cut the first 3.0 release, we can decide exactly which major/minor versions of python to test against on master and update python/run-tests.py accordingly.

That sounds like a good plan if it's not too much hassle for you. I would get a little nervous about some Python 2 issue holding up the release :)

@srowen

srowen commented Nov 14, 2019

Copy link
Copy Markdown
Member

Seems OK to me if it's forward progress. But we drop it entirely after branch 3.0 is cut?

@shaneknapp

Copy link
Copy Markdown
Contributor Author

If we are choosing 2 versions, I think 3.6 and maybe 3.8.

that works... it will take a little bit of work to get a 3.8 conda env set up, but i should be able to get that sorted by the end of january 2020.

once we cut the first 3.0 release, we can decide exactly which major/minor versions of python to test against on master and update python/run-tests.py accordingly.

That sounds like a good plan if it's not too much hassle for you. I would get a little nervous about some Python 2 issue holding up the release :)

well, tests are currently passing w/python2... which is great. if we see that python2 tests are going to hold up the 3.0 release, i can just yank the python2.7 (and pypy) executables from python/run-tests.py!

Seems OK to me if it's forward progress. But we drop it entirely after branch 3.0 is cut?

yeah, it's forward progress and i will be overjoyed to drop python2.7/pypy2.5.1 after 3.0.

i'll update the PR title/desc, as well as the JIRA and then merge later today.

@shaneknapp shaneknapp changed the title [SPARK-29672][PYSPARK] remove py27 support for jenkins tests and testing framework [SPARK-29672][PYSPARK] update spark testing framework to use python3 Nov 14, 2019
@shaneknapp shaneknapp deleted the remove-py27-tests branch November 14, 2019 18:21
@shaneknapp

Copy link
Copy Markdown
Contributor Author

thanks for the reviews everyone! much appreciated... :)

@HyukjinKwon

Copy link
Copy Markdown
Member

BTW, @shaneknapp quick question. did we change default python executable from Python 2 to Python 3? I was investigating some flaky tests (https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/113824/testReport/org.apache.spark.sql.streaming.continuous/ContinuousSuite/continuous_mode_with_various_UDFs___Scalar_Pandas_UDF/history/) which uses python executable and wondering if there's any delta recently.

@dongjoon-hyun

dongjoon-hyun commented Nov 15, 2019

Copy link
Copy Markdown
Member

Hi, @shaneknapp .
Could you update the Jenkins job scripts? The following Jenkins job seems to be broken.

- ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")
+ ZINC_PORT=$(python -S -c "import random; print(random.randrange(3030,4030))")

@HyukjinKwon

Copy link
Copy Markdown
Member

Oh, NVM about my comment. It was because #26133

@shaneknapp

Copy link
Copy Markdown
Contributor Author

Hi, @shaneknapp .
Could you update the Jenkins job scripts? The following Jenkins job seems to be broken.

- ZINC_PORT=$(python -S -c "import random; print random.randrange(3030,4030)")
+ ZINC_PORT=$(python -S -c "import random; print(random.randrange(3030,4030))")

yeah, i missed a couple of those. they're all fixed now.

@dongjoon-hyun

Copy link
Copy Markdown
Member

Thank you, @shaneknapp !

@AndrewLvov

AndrewLvov commented Dec 23, 2019

Copy link
Copy Markdown

Hey @HyukjinKwon , did you resolve the error
py4j.protocol.Py4JError: org.apache.spark.SparkConf.__name__ does not exist in the JVM

?
Could you please point me to changes if you did resolve it ?
I"m having the same error when trying to run PySpark using Python 3.

@HyukjinKwon

Copy link
Copy Markdown
Member

Can you file a JIRA with the reproducible steps?

dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2021
…encies.sh

### What changes were proposed in this pull request?

This PR is a followup of #26330. There is the last place to fix in `dev/test-dependencies.sh`

### Why are the changes needed?

To stick to Python 3 instead of using Python 2 mistakenly.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #33368 from HyukjinKwon/change-python-3.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2021
…encies.sh

### What changes were proposed in this pull request?

This PR is a followup of #26330. There is the last place to fix in `dev/test-dependencies.sh`

### Why are the changes needed?

To stick to Python 3 instead of using Python 2 mistakenly.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #33368 from HyukjinKwon/change-python-3.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6bd385f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2021
…encies.sh

### What changes were proposed in this pull request?

This PR is a followup of #26330. There is the last place to fix in `dev/test-dependencies.sh`

### Why are the changes needed?

To stick to Python 3 instead of using Python 2 mistakenly.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #33368 from HyukjinKwon/change-python-3.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6bd385f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request Jul 15, 2021
…encies.sh

### What changes were proposed in this pull request?

This PR is a followup of #26330. There is the last place to fix in `dev/test-dependencies.sh`

### Why are the changes needed?

To stick to Python 3 instead of using Python 2 mistakenly.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes #33368 from HyukjinKwon/change-python-3.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6bd385f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…encies.sh

### What changes were proposed in this pull request?

This PR is a followup of apache#26330. There is the last place to fix in `dev/test-dependencies.sh`

### Why are the changes needed?

To stick to Python 3 instead of using Python 2 mistakenly.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manually tested.

Closes apache#33368 from HyukjinKwon/change-python-3.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 6bd385f)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants